Bai S, Kolter J Z, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling[J]. arXiv preprint arXiv:1803.01271, 2018.
1. Overview
In this paper, the authors demonstrate that a simple convolutional architecture (the Temporal Convolutional Network, TCN) outperforms canonical recurrent networks such as the LSTM across a broad range of sequence modeling tasks
- TCN: longer effective memory, more accurate, simpler
- RNN: sequential in time, hard to parallelize
1.1. Background
1.1.1. Application
- part-of-speech tagging and semantic role labelling
- sentence classification
- document classification
- machine translation
- audio synthesis
- language modeling
1.1.2. Architecture
- LSTM
- GRU
- ConvLSTM
- Quasi-RNN
- dilated RNN
2. TCN
2.1. Architecture
- input: a sequence of any length
- output: a sequence of the same length
- no gating mechanism
- longer memory
- causal 1D convolutions with zero padding on the left, so the output keeps the input length
- receptive field determined by:
  - dilation factor d
  - kernel size k
  - network depth n
- dilation grows exponentially with depth (d = 2^i at level i) so that the top layer covers every input within the effective history (see the sketch below)
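A minimal sketch of this dilated causal convolution stack in PyTorch, for reference (not the authors' code; the paper's residual block additionally uses weight normalization and dropout, which are omitted here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D conv that pads on the left only, so output[t] never sees input[t'>t]."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # keeps output length == input length
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))        # pad the past side only

class TCNBlock(nn.Module):
    """Two dilated causal convs with ReLU plus a residual connection."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.conv1 = CausalConv1d(in_ch, out_ch, kernel_size, dilation)
        self.conv2 = CausalConv1d(out_ch, out_ch, kernel_size, dilation)
        self.res = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        y = torch.relu(self.conv1(x))
        y = torch.relu(self.conv2(y))
        return torch.relu(y + self.res(x))

# Dilation doubles per level (d = 2^i), so the receptive field grows exponentially with depth.
layers = [TCNBlock(1 if i == 0 else 32, 32, kernel_size=3, dilation=2 ** i) for i in range(4)]
tcn = nn.Sequential(*layers)
print(tcn(torch.randn(8, 1, 100)).shape)                 # torch.Size([8, 32, 100]), same length
```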
2.2. Advantage
- parallelism
- flexible receptive field size (via kernel size, dilation, and depth; see the calculation after this list)
- stable gradient
- low memory requirement for training
- variable length inputs
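To make the flexible receptive field point concrete: with dilation doubling per level, the receptive field grows exponentially in depth and only linearly in kernel size. A small helper for this arithmetic (my own calculation, assuming two causal convolutions per level as in the sketch above):

```python
def receptive_field(kernel_size, num_levels, convs_per_level=2):
    """Receptive field of a TCN whose dilation doubles per level (d = 2^i)."""
    r = 1
    for i in range(num_levels):
        r += convs_per_level * (kernel_size - 1) * (2 ** i)
    return r

print(receptive_field(kernel_size=3, num_levels=4))   # 61 time steps
print(receptive_field(kernel_size=7, num_levels=8))   # 3061 time steps
```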
2.3. Disadvantage
- data storage during evaluation: the raw sequence up to the effective history must be kept, rather than a fixed-size hidden state as in an RNN
- potential parameter change for a transfer of domain: a new domain may need a different amount of memory, so the receptive field (k, d, depth) may have to be re-tuned
3. Experiments
3.1. Details
- gradient clipping helped convergence; maximum norms in [0.3, 1] worked well (see the sketch after this list)
- the TCN is found to be insensitive to hyperparameter changes, as long as the effective history (receptive field) size is sufficient
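A minimal sketch of the gradient clipping step in PyTorch (the model, optimizer, loss, and the clip value 0.5 are placeholders; the paper only reports that maximum norms chosen from [0.3, 1] helped):

```python
import torch

model = torch.nn.Linear(10, 1)                          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss = model(torch.randn(4, 10)).pow(2).mean()          # placeholder loss

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm before the update; values in [0.3, 1] are reported to help.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
optimizer.step()
```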